
quic: make multiple improvements to packet #62589

Open
jasnell wants to merge 4 commits into nodejs:main from jasnell:jasnell/quic-packet-improvements

Conversation

Member

@jasnell jasnell commented Apr 4, 2026

Previously, Packets were ReqWrap objects with a shared free-list. This commit changes to a per-Endpoint arena with no V8 involvement. This is the design I originally had in mind, but I initially went with the simpler freelist approach to get something working. There's too much overhead in the ReqWrap/freelist approach, and individual packets do not really need to be observable via async hooks.

This design should eliminate the risk of memory fragmentation and eliminate a significant bottleneck in the hot path.

Summary of improvements:

Memory Comparison

| Metric | Before | After | Delta |
| --- | --- | --- | --- |
| Per-packet memory | ~2,140 bytes | 1,712 bytes | -20% |
| Heap allocations per acquire | 3-4 (Packet, Data, shared_ptr control, V8 object) | 0 (pre-allocated in block) | eliminated |
| Heap allocations per reuse (freelist hit) | 2 (Data, shared_ptr control) | 0 | eliminated |
| V8 heap per packet | ~200-400 bytes (JS object) | 0 | eliminated |
| Block allocation (128 slots) | N/A | 214 KB (one new char[]) | amortized across 128 acquires |
| Per-packet allocator overhead | ~48-96 bytes (malloc headers × 3-4 allocs) | 0 (inline in block) | eliminated |

Fragmentation

Before: Each packet reuse from the freelist still called std::make_shared(length, label) — a new heap allocation for the Data object + its shared_ptr control block + the std::string diagnostic label. These are small, variably-sized allocations scattered across the heap.

After: All slots are identical 1,712-byte regions within contiguous 214 KB blocks. Zero per-packet heap allocations during steady-state operation. The only allocations happen when a new block is grown.
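The block size follows directly from the slot size. As a rough check of the figures quoted here (the constant and function names below are illustrative assumptions, not identifiers from the patch):

```cpp
#include <cstddef>

// Illustrative constants taken from the figures in this description.
// The names are hypothetical and are not the patch's own identifiers.
constexpr std::size_t kSlotSize = 1712;      // bytes per packet slot
constexpr std::size_t kSlotsPerBlock = 128;  // slots in one block
constexpr std::size_t kBlockSize = kSlotSize * kSlotsPerBlock;

static_assert(kBlockSize == 219136, "128 slots of 1,712 bytes");
static_assert(kBlockSize / 1024 == 214, "214 KB per block");

// Byte offset of slot `index` inside a block. Every slot has the same size,
// so locating one is a single multiplication with no per-slot bookkeeping.
constexpr std::size_t SlotOffset(std::size_t index) {
  return index * kSlotSize;
}
```

Because every slot is the same size, a freed slot can always be reused by the next acquire, which is what rules out the small, variably-sized holes described under "Before".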

Performance Comparison

Acquire (hot path — called up to 32× per SendPendingData)

Before (freelist hit):

  1. BindingData::Get(env) — resolve binding data from environment
  2. packet_freelist.front() / pop_front() — std::list dereference (random memory access)
  3. std::make_shared(length, label) — heap allocate Data + control block
  4. std::string constructor for diagnostic label — potential heap allocation
  5. Set listener, destination, data pointer on Packet

Before (freelist miss):

  1. JS_NEW_INSTANCE_OR_RETURN — allocate V8 JS object (GC pressure, potentially triggers GC)
  2. MakeBaseObject(...) — heap allocate Packet
  3. std::make_shared(...) — heap allocate Data + control block
  4. ClearWeak() — modify V8 weak handle state

After (always):

  1. Pop from intrusive free list — slot = free_list_; free_list_ = slot->next_free; (2 pointer ops)
  2. Increment in_use_count_ on block and pool (2 increments)
  3. Placement new Packet in pre-allocated memory (zero-initializes uv_udp_send_t, copies SocketAddress)

The new acquire is essentially 2 pointer operations + a placement new. No heap allocation, no V8 involvement, no atomic operations (shared_ptr control block had atomics).
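The acquire steps above can be sketched roughly as follows. This is a minimal stand-alone illustration, not the code in this PR: `TinyPool`, `Slot`, and the simplified `Packet` are hypothetical stand-ins, and the pool here has a fixed size where the real one would grow a new block.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical stand-in for the real Packet; only the shape matters here.
struct Packet {
  std::size_t length;
  explicit Packet(std::size_t len) : length(len) {}
};

// Each slot is either on the free list (next_free is meaningful) or holds a
// live Packet constructed inside `storage`.
struct Slot {
  Slot* next_free = nullptr;
  alignas(Packet) unsigned char storage[sizeof(Packet)];
};

class TinyPool {
 public:
  static constexpr std::size_t kSlots = 4;

  TinyPool() {
    // Thread every slot onto the intrusive free list once, up front.
    for (std::size_t i = 0; i < kSlots; ++i) {
      slots_[i].next_free = free_list_;
      free_list_ = &slots_[i];
    }
  }

  // Acquire: pop the head of the free list (two pointer operations), bump
  // the counter, and construct the Packet in place. No heap allocation.
  Packet* Acquire(std::size_t length) {
    if (free_list_ == nullptr) return nullptr;  // a real pool would grow here
    Slot* slot = free_list_;
    free_list_ = slot->next_free;
    ++in_use_count_;
    return new (slot->storage) Packet(length);
  }

  std::size_t in_use() const { return in_use_count_; }

 private:
  Slot slots_[kSlots];
  Slot* free_list_ = nullptr;
  std::size_t in_use_count_ = 0;
};
```

The free list is "intrusive" because the link pointer lives in the slot itself, so popping a slot never touches a separately allocated list node.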

Release (send callback — every completed packet)

Before:

  1. BaseObjectPtr construction from raw pointer — atomic increment
  2. MakeWeak() — modify V8 weak handle
  3. Check IsDispatched(), call listener
  4. data_.reset() — atomic decrement on shared_ptr, may free Data
  5. Reset() — reset uv_udp_send_t state
  6. packet_freelist.push_back(std::move(self)) — std::list node allocation (!)
  7. Or if freelist full: destroy Packet → V8 GC eventually collects JS object
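Step 6 is easy to miss: `std::list` is node-based, so every `push_back` allocates a node even when the packet itself is being recycled. A small sketch with a counting allocator makes that visible (the allocator, counter, and function names are illustrative and do not come from the patch):

```cpp
#include <cstddef>
#include <list>
#include <memory>

// Global counter bumped by every allocation made through CountingAlloc.
static std::size_t g_allocs = 0;

// Minimal allocator that forwards to std::allocator and counts calls.
template <typename T>
struct CountingAlloc {
  using value_type = T;
  CountingAlloc() = default;
  template <typename U>
  CountingAlloc(const CountingAlloc<U>&) {}
  T* allocate(std::size_t n) {
    ++g_allocs;
    return std::allocator<T>().allocate(n);
  }
  void deallocate(T* p, std::size_t n) {
    std::allocator<T>().deallocate(p, n);
  }
};

template <typename T, typename U>
bool operator==(const CountingAlloc<T>&, const CountingAlloc<U>&) {
  return true;
}
template <typename T, typename U>
bool operator!=(const CountingAlloc<T>&, const CountingAlloc<U>&) {
  return false;
}

// Returns the number of allocations caused by a single push_back.
std::size_t AllocationsForOnePush() {
  std::list<int, CountingAlloc<int>> freelist;
  std::size_t before = g_allocs;
  freelist.push_back(42);  // allocates one list node
  return g_allocs - before;
}
```

An intrusive free list avoids this cost because the link is stored inside the element being returned.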

After:

  1. Packet::FromReq(req) — ContainerOf pointer arithmetic (compile-time offset)
  2. Call listener
  3. ArenaPool::Release(p) — ~Packet() (trivial), then ReleaseSlot:
    • Pointer arithmetic to recover SlotHeader
    • slot->next_free = free_list_; free_list_ = slot; (2 pointer ops)
    • Decrement 2 counters
    • MaybeGC() check (branch, rarely taken)

The new release is pointer arithmetic + 2 pointer operations + 2 decrements. No atomic operations, no heap free, no V8 interaction.
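The release steps above can be sketched in the same spirit. This assumes a standard-layout `Packet` that embeds its request and a slot header placed directly in front of the packet storage. `SendReq`, `Slot`, and `FreeList` are hypothetical stand-ins (`SendReq` takes the place of `uv_udp_send_t`), and the `MaybeGC()` check is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical stand-in for uv_udp_send_t.
struct SendReq {
  void* data = nullptr;
};

// Hypothetical standard-layout Packet that embeds its request.
struct Packet {
  std::size_t length = 0;
  SendReq req;

  // Recover the owning Packet from a pointer to its embedded request by
  // subtracting the member's compile-time offset.
  static Packet* FromReq(SendReq* r) {
    return reinterpret_cast<Packet*>(reinterpret_cast<char*>(r) -
                                     offsetof(Packet, req));
  }
};

// A slot is a header followed by the storage the Packet lives in.
struct Slot {
  Slot* next_free = nullptr;
  alignas(Packet) unsigned char storage[sizeof(Packet)];
};

struct FreeList {
  Slot* head = nullptr;
  std::size_t in_use = 0;

  // Release: destroy the Packet, step back from the storage to the slot
  // header, and push the slot onto the free list.
  void Release(Packet* p) {
    p->~Packet();
    Slot* slot = reinterpret_cast<Slot*>(reinterpret_cast<char*>(p) -
                                         offsetof(Slot, storage));
    slot->next_free = head;
    head = slot;
    assert(in_use > 0);
    --in_use;
  }
};
```

Both recoveries are plain subtractions with offsets known at compile time, so the callback needs no lookup table and no reference counting to find its way back to the slot.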

Send path (UDP::Send)

Before: ClearWeak() + Dispatched() + uv_udp_send() + on error: Done() + MakeWeak()

After: Ptr::release() (1 pointer swap) + uv_udp_send() + on error: ArenaPool::Release()
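One way to picture the new send path is as an ownership hand-off: the caller holds the packet in a smart pointer, releases it to the asynchronous send, and returns it to the pool itself only if the send fails synchronously. The sketch below assumes a `std::unique_ptr` with a pool-aware deleter. `Pool`, `PoolDeleter`, `PacketPtr`, and `SendFn` are hypothetical names, and `SendFn` merely stands in for `uv_udp_send()`.

```cpp
#include <cassert>
#include <memory>

struct Packet {
  int id = 0;
};

// Hypothetical pool that only counts releases.
struct Pool {
  int released = 0;
  void Release(Packet*) { ++released; }  // a real pool would recycle the slot
};

// Deleter that hands the packet back to its pool instead of calling delete.
struct PoolDeleter {
  Pool* pool;
  void operator()(Packet* p) const { pool->Release(p); }
};
using PacketPtr = std::unique_ptr<Packet, PoolDeleter>;

// Stand-in for uv_udp_send(): returns 0 on success, non-zero on error.
// On success the completion callback owns the packet until the send is done.
using SendFn = int (*)(Packet*);

int Send(PacketPtr packet, Pool& pool, SendFn send) {
  Packet* raw = packet.release();   // give up ownership: one pointer swap
  int err = send(raw);
  if (err != 0) pool.Release(raw);  // the callback will never fire
  return err;
}
```

On the success path nothing is released here, since the completion callback is expected to do that once the send finishes.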

SendPendingData loop (up to 32 packets per call)

Before: Each iteration potentially triggered JS_NEW_INSTANCE_OR_RETURN (V8 object allocation) on freelist miss, plus std::make_shared on every iteration.

After: Each iteration is just a free list pop + placement new. For a full 32-packet burst from a warm pool, this is ~32 × (2 pointer ops + a memset/memcpy for the Packet fields) — essentially zero allocation cost.

GC pressure

Before: Each Packet had a persistent V8 JS object. When the freelist was full (>100 packets), excess packets were destroyed, leaving their V8 objects for the garbage collector. Under high throughput, this created ongoing GC pressure proportional to packet churn.

After: Zero V8 objects. Zero GC pressure from packets. The ArenaPool::MaybeGC() only runs when >50% of total slots are free and only frees entire blocks — a rare bulk operation, not per-packet work.
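The collection policy described above might look something like the following sketch. It models only the decision logic on per-block counters. `BlockStats` is a hypothetical type, and keeping at least one block alive is an assumption of this sketch, not something stated in the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct BlockStats {
  std::size_t slots;   // total slots in the block
  std::size_t in_use;  // slots currently holding a live packet
};

// Returns how many blocks a collection pass frees. The pass only runs when
// more than half of all slots are free, and it only drops blocks that are
// completely idle. Assumption: at least one block is always kept.
std::size_t MaybeGC(std::vector<BlockStats>& blocks) {
  std::size_t total = 0;
  std::size_t used = 0;
  for (const BlockStats& b : blocks) {
    total += b.slots;
    used += b.in_use;
  }
  assert(used <= total);
  std::size_t free_slots = total - used;
  if (free_slots * 2 <= total) return 0;  // not above the 50% threshold

  std::size_t freed = 0;
  for (auto it = blocks.begin(); it != blocks.end();) {
    if (it->in_use == 0 && blocks.size() > 1) {
      it = blocks.erase(it);  // release the whole block at once
      ++freed;
    } else {
      ++it;
    }
  }
  return freed;
}
```

The common case is the early return, which matches the "branch, rarely taken" description of the check on the release path.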

Summary

| Aspect | Improvement |
| --- | --- |
| Per-packet memory | ~20% smaller (1,712 vs ~2,140 bytes) |
| Heap fragmentation | Eliminated (contiguous block allocation) |
| Heap allocations per acquire | 0 (was 2-4) |
| V8 GC pressure | Eliminated entirely |
| Atomic operations per acquire/release | 0 (was 2+ from shared_ptr) |
| Cache locality | Improved (sequential slots in contiguous blocks) |
| Acquire cost | ~2 pointer ops (was: conditional heap alloc + V8 object + shared_ptr) |
| Release cost | ~4 pointer ops + 2 decrements (was: atomic decrement + V8 weak handle + list node alloc) |
| SendPendingData 32-packet burst | ~32 × pointer swap (was: 32 × potential heap alloc + V8 alloc) |
| Steady-state memory overhead | Fixed: 1 block = 214 KB for 128 slots (was: unbounded individual allocations) |

The biggest wins are eliminating the per-packet V8 object allocation (which could trigger GC) and the shared_ptr atomic operations on every acquire/release. For a high-throughput QUIC session sending 32 packets per SendPendingData call, the new path is essentially allocation-free after the first block is populated.

Signed-off-by: James M Snell <jasnell@gmail.com>
Assisted-by: Opencode:Opus 4.6

@jasnell jasnell requested a review from Qard April 4, 2026 18:00
@nodejs-github-bot
Collaborator

Review requested:

  • @nodejs/gyp

@nodejs-github-bot added the c++, lib / src, and needs-ci labels Apr 4, 2026
jasnell added 2 commits April 4, 2026 11:04
Previously Packets were ReqWrap objects with a shared
free-list. This commit changes to a per-Endpoint arena
with no V8 involvement. This is the design I originally
had in mind but I initially went with the simpler
freelist approach to get something working. There's
too much overhead in the ReqWrap/freelist approach and
individual packets do not really need to be observable
via async hooks.

This design should eliminate the risk of memory fragmentation
and eliminate a significant bottleneck in the hot path.

Signed-off-by: James M Snell <jasnell@gmail.com>
Assisted-by: Opencode:Opus 4.6
Handful of additional improvements to the Packet class.

Signed-off-by: James M Snell <jasnell@gmail.com>
Assisted-by: Opencode:Opus 4.6
@jasnell jasnell force-pushed the jasnell/quic-packet-improvements branch from ead61aa to 66d349a Compare April 4, 2026 18:05
Signed-off-by: James M Snell <jasnell@gmail.com>
@jasnell jasnell force-pushed the jasnell/quic-packet-improvements branch from 66d349a to da1e78a Compare April 4, 2026 18:06
@jasnell jasnell requested a review from mcollina April 4, 2026 18:07
Signed-off-by: James M Snell <jasnell@gmail.com>

Member

@mcollina mcollina left a comment


lgtm
